fix(evmrpc): cap StoreTracer access log to prevent debug_traceStateAccess issue (PLT-360)#3653
fix(evmrpc): cap StoreTracer access log to prevent debug_traceStateAccess issue (PLT-360)#3653amir-deris wants to merge 4 commits into
Conversation
PR SummaryMedium Risk Overview Per-module access logs are now limited to 4 MiB of retained key/value payload (plus 64 bytes overhead per retained event). All store trace events go through Adds tests for bounded retention on large iterator scans and for accurate stats when the log is truncated. Reviewed by Cursor Bugbot for commit c9db9ee. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
There was a problem hiding this comment.
A focused, well-reasoned fix that caps the per-module access log to stop debug_traceStateAccess from growing memory unboundedly, while accumulating op stats at record time so they stay accurate under truncation. No correctness issues found; the main gaps are a claimed-but-missing test and transient allocation that the cap doesn't bound.
Findings: 0 blocking | 4 non-blocking | 1 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- The PR description claims a
TestStoreTracerStatsAccurateWhenTruncatedtest, butsei-cosmos/types/tracer_test.gocontains onlyTestStoreTracerBoundsIteratorScanMemory. The stats-accuracy-under-truncation behavior is the subtlest part of this change and feedshistoricalLookupNanos(evmrpc/trace_profile.go:156-164), which reads the top-levelStatsmap — please add the missing test so this guarantee is covered (e.g. drive past the cap and assert per-op Count/TotalNanos stay exact whileTruncatedis set andReadsis bounded). - Cursor's second-opinion review (cursor-review.md) was empty — no findings from that pass were available to merge.
- Edge case worth a one-line note: a single access whose payload alone exceeds
maxStoreTraceModuleBytestrips truncation on the very first access (nothing retained, module flaggedTruncated). This is acceptable given the bounding goal, but is a slightly surprising consequence of checkingaccessBytes+cost > capbefore retaining anything. - 1 suggestion(s)/nit(s) flagged inline on specific lines.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3653 +/- ##
==========================================
- Coverage 58.94% 58.20% -0.74%
==========================================
Files 2262 2178 -84
Lines 187031 177384 -9647
==========================================
- Hits 110243 103248 -6995
+ Misses 66847 64994 -1853
+ Partials 9941 9142 -799
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit b08adc1. Configure here.
There was a problem hiding this comment.
A focused, well-tested fix that caps the StoreTracer per-module access log to bound memory in debug_traceStateAccess while keeping op stats exact under truncation. The core logic (unconditional stats, prefix-retention prestate, per-access overhead charge) is correct; no blocking issues found.
Findings: 0 blocking | 3 non-blocking | 0 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- The cap is per-module with no global ceiling. Worst-case retained payload is ~4 MiB × (number of modules touched). Module count is finite so this is bounded, but since debug_traceStateAccess replays an entire block prefix into a single shared tracer, a tx fanning across many modules can still retain tens of MiB. Consider a global byte budget across modules if the OOM was driven by module fan-out rather than a single huge scan.
- Cursor's second-opinion review (cursor-review.md) was empty — no output produced.
- Codex's second-opinion review reported no findings but noted it could not run
go test ./sei-cosmos/typesbecause the Go toolchain download was network-blocked, so its pass was not test-validated.
Comments that couldn't be anchored to the diff
sei-cosmos/types/tracer.go:213-- [nit] Minor inconsistency (not a bug): aftermt.Truncatedis set,RecordIteratorValuestill appends toit.Keys(bounded at 64) even though the access log is frozen. Iterator key samples are bounded separately, so this is harmless, but the per-iterator sample can keep growing for already-opened iterators after the module log is marked Truncated — worth a one-line note if the intent was to freeze all sampling at truncation.
There was a problem hiding this comment.
A focused, correct fix that caps the per-module StoreTracer access log at 4 MiB (plus a 64-byte per-access overhead charge) while accumulating op stats at record time so they stay exact under truncation. The design, comments, and two new tests are solid; only minor non-blocking observations remain.
Findings: 0 blocking | 3 non-blocking | 0 posted inline
Blockers
- None at the file/PR level.
Non-blocking
- Behavioral nuance worth noting:
DerivePrestateToJson(evmrpc/tracers.go:567) feeds the debug_traceTransaction/traceStateAccessAppState. When a module isTruncated, theReadsprestate is now a bounded prefix, so a replay relying on that prestate may be incomplete. This is the intended memory-vs-completeness tradeoff and thetruncatedflag surfaces it, but consumers of the replayed prestate should ideally check the flag — out of scope here, but worth a follow-up. - Per-iterator sampled keys (
it.Keys, up to 16 iterators × 64 keys) and iterator Start/End clones are not counted againstmaxStoreTraceModuleBytes. The amount is small and bounded, so the per-module memory claim still holds, but the cap is on the access-log payload only, not strictly all retained per-module bytes. - Second-opinion passes: Codex reported no material findings (with a note it could not run tests due to a Go 1.24 vs required 1.25.6 toolchain mismatch);
cursor-review.mdwas empty, so no Cursor findings were available to merge.

Problem
debug_traceStateAccesscould add too much memory in a single call. The root cause is in the storeStoreTracer: the per-module access log (mt.Accesses) was uncapped. EveryGet/Set/Has/Delete/IteratorValue/IteratorNextcloned its key (and value) into that log, so a pathological tx — e.g. a large iterator scan — grew retained memory linearly with the number of keys touched. The existing 16-iterator / 64-key caps only bounded the per-iterator summary, not the raw access log.debug_traceStateAccessmakes it worse by replaying the entire block prefix into a single shared tracer.Fix
Bound the per-module access log while keeping the trace useful:
maxStoreTraceModuleBytes(4 MiB) plus a fixedperAccessOverheadBytes(64) charge per retained access — the overhead charge bounds high-count / near-zero-payload patterns (e.g. iteratorNext()floods), not just large values.ModuleTrace.record(...): it always updates running op stats, then retains the cloned key/value only while the module stays under the cap. Past the cap the module is flaggedTruncatedand further accesses are dropped from the log (first-N retention — the earliest reads are the prestate, so this is the correct truncation direction).Stats(per-op counts/durations) at record time instead of recomputing them from the log, so they stay exact even when the log is truncated. This matters because the profile report (historicalLookupNanos,trace_profile_report.go) consumes those stats.truncatedonModuleTraceDumpso consumers can see the log was bounded.Net effect: prestate
Reads/Has/iterator-key samples are bounded; op stats remain complete.Tests
TestStoreTracerBoundsIteratorScanMemory— drives a 100k-key scan and asserts the tracer retains ≤ 8 MiB (was ~51.5 MB before the fix, i.e. unbounded).TestStoreTracerStatsAccurateWhenTruncated— drivesGetpast the cap and asserts per-opCount/TotalNanosstay exact (in the internal stats, the per-module dump, and the top-levelStatsmap) whileTruncatedis set andReadsis bounded.Verified
go build/go vetclean onsei-cosmos/types,evmrpc/...,sei-db/tools/..., and downstream consumer tests pass (TestKeeperTestSuite/TestViewKeeperStoreTrace,gaskv,giga/deps/xbank).